Module 1

By Yiqiao Zhang, Jia Liu, Jianxiong Wang, Xinjie Ye

Part I: Data Description

The dataset contains 17 variables measured on a sample of 252 men. The variables are IDNO (index), percentage of body fat (%), body density from underwater weighing (gm/cm^3), age (years), weight (lbs), height (inches), adiposity (BMI), and ten body circumferences (neck, chest, abdomen, hip, thigh, knee, ankle, biceps, forearm, and wrist, all in cm). Percentage of body fat is computed from Siri's (1956) equation:

$$BodyFat\ \% = \frac{495}{Density} - 450$$
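
As a quick check of the formula, Siri's equation can be wrapped in a small R helper. This is only a minimal sketch: the function name `siri_bodyfat` is our own illustration and is not part of the dataset or of any package. The densities used are those of records 96, 76, and 48, which are examined later in this notebook.

```r
# Hypothetical helper (the name is an assumption): body fat % from density,
# following Siri's (1956) equation
siri_bodyfat <- function(density) {
  495 / density - 450
}

# Densities of records 96, 76 and 48 from the dataset
round(siri_bodyfat(c(1.0991, 1.0666, 1.0665)), 2)
# 0.37 14.09 14.14
```

These values agree with the ones printed in the consistency checks further down.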

Part II: Thesis Statement

To accurately estimate body fat from clinical measurements, we used several criteria to select important variables. After model diagnosis, we transformed independent variables to better satisfy the linear model assumptions.

We found that, among all 14 candidate variables, a linear function of WEIGHT, a transformation of WEIGHT, ABDOMEN, FOREARM, and WRIST best explains body fat.

Data Cleaning

General demographic information

In [74]:
library("MASS")
library("car")
library(plotly)
data = read.csv("Bodyfat.csv", header = TRUE)[,-1]
attach(data)
In [75]:
plot_ly(data, x = ~BODYFAT, type="histogram")

The histogram shows one observation with 0% body fat, which is impossible; the maximum body fat value also looks abnormal. Let's check these two points:

In [23]:
data[which(BODYFAT == 0), ]
data[which(BODYFAT == max(data$BODYFAT)), ]
bad <- NULL
bad <- which(BODYFAT == 0)
    BODYFAT DENSITY AGE WEIGHT HEIGHT ADIPOSITY NECK CHEST ABDOMEN   HIP THIGH KNEE ANKLE BICEPS FOREARM WRIST
182       0  1.1089  40  118.5     68      18.1 33.8  79.3    69.4    85  47.2 33.5  20.2   27.7    24.6  16.5
    BODYFAT DENSITY AGE WEIGHT HEIGHT ADIPOSITY NECK CHEST ABDOMEN   HIP THIGH KNEE ANKLE BICEPS FOREARM WRIST
216    45.1   0.995  51    219     64      37.6 41.2 119.8   122.1 112.8  62.5 36.9  23.6   34.7    29.1  18.4

We recomputed body fat for record 182 from its density using Siri's equation, but the result is negative. We therefore treat this observation as having a missing dependent variable and exclude it from the model.
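
For reference, substituting the density of record 182 (1.1089, taken from the output above) into Siri's equation gives approximately:

$$BodyFat\ \% = \frac{495}{1.1089} - 450 \approx 446.39 - 450 = -3.61$$

A negative percentage is physically impossible, so the density of this record cannot be used to recover a usable body fat value.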

In addition, the maximum values of weight and of the neck, chest, abdomen, hip, thigh, knee, biceps, and wrist circumferences all come from the same observation, record 39. We set it aside for further study.

We conclude that record 182 is not valid.

General Demographic Information

We fit the full linear model without record 182:

In [24]:
m1 <- lm(BODYFAT ~ ., data = data[-bad, -2])
In [25]:
plot(m1, which = 4)
abline(h = 4/(nrow(data)-ncol(data)), lty = 2)
In [26]:
data[c(39, 42, 86),]
bad <- c(bad,42)
    BODYFAT DENSITY AGE WEIGHT HEIGHT ADIPOSITY NECK CHEST ABDOMEN   HIP THIGH KNEE ANKLE BICEPS FOREARM WRIST
 39    33.8  1.0202  46 363.15  72.25      48.9 51.2 136.2   148.1 147.7  87.3 49.1  29.6   45.0    29.0  21.4
 42    31.7  1.0250  44 205.00  29.50      29.9 36.6 106.0   104.3 115.5  70.6 42.5  23.7   33.6    28.7  17.4
 86    25.8  1.0386  67 167.00  67.50      26.0 36.5  98.9    89.7  96.2  54.7 37.8  33.7   32.4    27.7  18.2

We conclude that record 42 may not be valid: its recorded height of 29.50 inches is implausible for an adult.
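
The Cook's distance screening used above can be sketched on its own. The example below is a minimal illustration on synthetic data (not the body fat data), with one deliberately corrupted record. The cutoff `4 / (n - number of coefficients)` is a common rule of thumb in the same spirit as the dashed line drawn above, and all variable names are our own.

```r
# Synthetic data (NOT the body fat data): a linear trend plus one corrupted record
set.seed(1)
n <- 50
x <- 1:n
y <- 2 * x + rnorm(n)
y[n] <- y[n] + 30            # inject an obvious recording error in the last row

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)   # one Cook's distance per observation

# Rule-of-thumb cutoff: 4 / (n - number of estimated coefficients)
threshold <- 4 / (n - length(coef(fit)))
flagged   <- which(cd > threshold)
flagged                      # indices of observations above the cutoff
```

Flagged observations are candidates for inspection rather than automatic removal; in this notebook each one was checked against the raw record before deciding.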

Consistency of DENSITY versus BODYFAT

In [27]:
reverse_de <- 1/DENSITY
plot_ly(data, x = ~reverse_de, y = ~BODYFAT, type= "scatter", mode = "markers")
m0 <- lm(BODYFAT ~ reverse_de)
In [28]:
plot(m0, which = 1)
In [29]:
data[96,]
round(495/DENSITY[96]- 450, 2)
    BODYFAT DENSITY AGE WEIGHT HEIGHT ADIPOSITY NECK CHEST ABDOMEN   HIP THIGH KNEE ANKLE BICEPS FOREARM WRIST
 96    17.3  1.0991  53  224.5  77.75      26.1 41.1 113.2    99.2 107.5  61.7 42.3  23.2   32.9    30.8  20.4
0.37
In [30]:
data[c(76, 24),]
round(495/DENSITY[76]- 450, 2)
    BODYFAT DENSITY AGE WEIGHT HEIGHT ADIPOSITY NECK CHEST ABDOMEN   HIP THIGH KNEE ANKLE BICEPS FOREARM WRIST
 76    18.3  1.0666  61 148.25   67.5      22.9 36.0  91.6    81.8  94.8  54.5 37.0  21.4   29.3    27.0  18.3
 24    17.6  1.0584  32 148.75   70.0      21.4 35.5  86.7    80.0  93.4  54.9 36.2  22.1   29.8    26.7  17.1
14.09
In [31]:
data[c(48, 24),]
round(495/DENSITY[48]- 450, 2)
data$BODYFAT[48] <- round(495/data$DENSITY[48]- 450, digits = 1)
    BODYFAT DENSITY AGE WEIGHT HEIGHT ADIPOSITY NECK CHEST ABDOMEN   HIP THIGH KNEE ANKLE BICEPS FOREARM WRIST
 48     6.4  1.0665  39 148.50  71.25      20.6 34.6  89.8    79.5  92.7  52.7 37.5  21.9   28.8    26.8  17.9
 24    17.6  1.0584  32 148.75  70.00      21.4 35.5  86.7    80.0  93.4  54.9 36.2  22.1   29.8    26.7  17.1
14.14

For record 48, we decided to use the BODYFAT value calculated from DENSITY.

Consistency of BMI versus WEIGHT & HEIGHT

In [ ]:
BMI <- (WEIGHT/2.2046226218)/((HEIGHT*0.0254)^2)
boxplot(BMI - ADIPOSITY)
which(abs(BMI - data$ADIPOSITY) > 1)
In [ ]:
data[which(WEIGHT > 183 & WEIGHT < 185 & HEIGHT > 67 & HEIGHT < 69), ]
In [33]:
data[which(WEIGHT > 153 & WEIGHT < 155 & HEIGHT > 69 & HEIGHT < 71), ]
    BODYFAT DENSITY AGE WEIGHT HEIGHT ADIPOSITY NECK CHEST ABDOMEN   HIP THIGH KNEE ANKLE BICEPS FOREARM WRIST
218     8.2  1.0819  51 154.50  70.00      22.2 36.9  93.3    81.5  94.4  54.7 39.0  22.6   27.5    25.9  18.6
220    15.1  1.0646  53 154.50  69.25      22.7 37.6  93.9    88.7  94.5  53.7 36.2  22.0   28.5    25.7  17.1
221    12.7  1.0706  54 153.25  70.50      24.5 38.5  99.0    91.8  96.2  57.7 38.1  23.9   31.4    29.9  18.9

We conclude that records 163 and 221 may not be valid.
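
The BMI consistency check can be illustrated on two of the records printed above. This is a minimal sketch: the helper name `bmi_from_imperial` and the small `check` data frame are our own illustration, while the conversion factors and the tolerance of 1 are the ones used in the notebook.

```r
# Hypothetical helper (the name is an assumption): BMI in kg/m^2 from pounds and inches
bmi_from_imperial <- function(weight_lbs, height_in) {
  weight_kg <- weight_lbs / 2.2046226218   # pounds -> kilograms
  height_m  <- height_in * 0.0254          # inches -> metres
  weight_kg / height_m^2
}

# Records 218 and 221 as printed above
check <- data.frame(
  id        = c(218, 221),
  WEIGHT    = c(154.50, 153.25),
  HEIGHT    = c(70.00, 70.50),
  ADIPOSITY = c(22.2, 24.5)
)
check$BMI  <- round(bmi_from_imperial(check$WEIGHT, check$HEIGHT), 2)
check$flag <- abs(check$BMI - check$ADIPOSITY) > 1   # same tolerance as the notebook
check
```

Record 218 is consistent, whereas the recomputed BMI of record 221 (about 21.7) differs from its recorded ADIPOSITY of 24.5 by more than 1, so it is flagged.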

In [34]:
bad <- c(bad, 221, 163)

detach(data)
data <- data[-bad, -2]
attach(data)
m1 <- lm(BODYFAT ~ ., data)
m_null <- lm(BODYFAT ~ 1, data)

Variable Selection

Method               Selected Variables
BIC Backward         WEIGHT, ABDOMEN, FOREARM, WRIST
BIC Forward & Both   ABDOMEN, WEIGHT
AIC Backward         10 variables
AIC Forward & Both   6 variables
Mallows' Cp          9 variables
LASSO                5 variables
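
For a linear model, `step()` scores each candidate with `n * log(RSS / n) + k * edf`, where `edf` is the number of estimated coefficients; `k = 2` gives AIC and `k = log(n)` gives BIC. The sketch below reproduces the two starting values shown in the traces that follow. The helper name `step_criterion` is our own, and the RSS and sample size are read from the notebook output.

```r
# Hypothetical helper (the name is an assumption) reproducing the criterion printed by step()
step_criterion <- function(rss, n, edf, k = 2) {
  n * log(rss / n) + k * edf
}

n   <- 248        # observations left after data cleaning
rss <- 3602.249   # residual sum of squares of the full model
edf <- 15         # intercept + 14 predictors

step_criterion(rss, n, edf, k = 2)            # AIC penalty, about 693.62
step_criterion(rss, n, edf, k = log(n - 1))   # penalty used in the BIC runs here, about 746.26
```

Because `step()` labels the criterion "AIC" whatever the value of `k`, the BIC traces below also print "AIC="; the larger penalty is what makes them select smaller models.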
In [35]:
### AIC
m_null <- lm(BODYFAT ~ 1, data)
m_AIC_back <- step(m1, k=2)
m_AIC_for <- step(m_null, direction="forward",
                       scope=list(lower=~1,upper=m1))
m_AIC_both <- step(m_null, direction="both",
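                        scope=list(lower=~1, upper=m1))  # the selected models seem too complicated

### BIC-style penalty: standard BIC uses k = log(n); here k = log(n - 1).
### step() still labels the criterion "AIC" in its trace.
m_BIC_back <- step(m1, k=log(nrow(data)-1)) # WEIGHT, ABDOMEN, FOREARM, WRIST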
m_BIC_for <- step(m_null, direction="forward", 
                       scope=list(lower=~1,upper=m1), k=log(nrow(data)-1))  # WEIGHT, ABDOMEN
m_BIC_both <- step(m_null, direction="both",
                        scope=list(lower=~1,upper= m1), k=log(nrow(data)-1))  # WEIGHT, ABDOMEN
m2 <- m_BIC_both # keep only ABDOMEN, WEIGHT
(s2 <- summary(m2))
(mse2 <- sum((s2$residuals)^2)/nrow(data))
# round the coefficients to make the model easier to calculate
fit <- -40 + ABDOMEN - 0.2*WEIGHT 
mse <- sum((fit - BODYFAT)^2)/nrow(data)
res <- fit - BODYFAT

m3 <- lm(BODYFAT ~ ABDOMEN)
summary(m3)

m4 <- lm((BODYFAT)*WEIGHT ~ ABDOMEN + WEIGHT, data) # transform
summary(m4)
(mse4 <- sum((m4$residuals/WEIGHT)^2)/nrow(data)) # worse

m5 <- lm(BODYFAT ~ WEIGHT + ABDOMEN + FOREARM + WRIST, data) # WEIGHT, ABDOMEN, FOREARM, WRIST  
(s5 <- summary(m5))
(mse5 <- sum((s5$residuals)^2)/nrow(data))
# round the model
fit5 <- -35 -0.15 * WEIGHT + ABDOMEN + 0.4*FOREARM - WRIST 
mse5 <- sum((fit5- BODYFAT)^2)/nrow(data) # harder to round
Start:  AIC=693.62
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST + 
    ABDOMEN + HIP + THIGH + KNEE + ANKLE + BICEPS + FOREARM + 
    WRIST

            Df Sum of Sq    RSS    AIC
- KNEE       1      0.85 3603.1 691.68
- ANKLE      1      5.32 3607.6 691.99
- CHEST      1      8.02 3610.3 692.17
- BICEPS     1     18.07 3620.3 692.86
<none>                   3602.2 693.62
- HIP        1     44.68 3646.9 694.68
- THIGH      1     44.92 3647.2 694.69
- NECK       1     47.70 3650.0 694.88
- AGE        1     55.57 3657.8 695.42
- FOREARM    1     61.67 3663.9 695.83
- HEIGHT     1     74.89 3677.1 696.72
- ADIPOSITY  1     78.71 3681.0 696.98
- WRIST      1    118.08 3720.3 699.62
- WEIGHT     1    127.21 3729.5 700.23
- ABDOMEN    1   1621.45 5223.7 783.79

Step:  AIC=691.68
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST + 
    ABDOMEN + HIP + THIGH + ANKLE + BICEPS + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- ANKLE      1      6.40 3609.5 690.12
- CHEST      1      8.02 3611.1 690.23
- BICEPS     1     17.71 3620.8 690.89
<none>                   3603.1 691.68
- HIP        1     43.92 3647.0 692.68
- NECK       1     49.52 3652.6 693.06
- THIGH      1     53.57 3656.7 693.34
- AGE        1     61.72 3664.8 693.89
- FOREARM    1     63.84 3666.9 694.03
- HEIGHT     1     74.13 3677.2 694.73
- ADIPOSITY  1     77.90 3681.0 694.98
- WRIST      1    117.27 3720.4 697.62
- WEIGHT     1    127.09 3730.2 698.27
- ABDOMEN    1   1624.66 5227.8 781.98

Step:  AIC=690.12
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST + 
    ABDOMEN + HIP + THIGH + BICEPS + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- CHEST      1      9.48 3619.0 688.77
- BICEPS     1     16.57 3626.1 689.25
<none>                   3609.5 690.12
- HIP        1     46.96 3656.5 691.32
- THIGH      1     54.52 3664.0 691.84
- NECK       1     55.99 3665.5 691.94
- AGE        1     59.98 3669.5 692.21
- FOREARM    1     62.80 3672.3 692.40
- HEIGHT     1     78.03 3687.5 693.42
- ADIPOSITY  1     83.51 3693.0 693.79
- WRIST      1    110.86 3720.4 695.62
- WEIGHT     1    126.05 3735.6 696.63
- ABDOMEN    1   1630.49 5240.0 780.56

Step:  AIC=688.77
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN + 
    HIP + THIGH + BICEPS + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- BICEPS     1     15.36 3634.3 687.82
<none>                   3619.0 688.77
- HIP        1     39.69 3658.7 689.47
- NECK       1     55.80 3674.8 690.56
- AGE        1     57.64 3676.6 690.69
- FOREARM    1     59.61 3678.6 690.82
- THIGH      1     67.89 3686.9 691.38
- HEIGHT     1     72.97 3692.0 691.72
- ADIPOSITY  1     75.54 3694.5 691.89
- WRIST      1    107.04 3726.0 694.00
- WEIGHT     1    127.77 3746.8 695.37
- ABDOMEN    1   1697.80 5316.8 782.17

Step:  AIC=687.82
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN + 
    HIP + THIGH + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
<none>                   3634.3 687.82
- HIP        1     45.32 3679.7 688.89
- NECK       1     51.85 3686.2 689.33
- AGE        1     61.89 3696.2 690.01
- HEIGHT     1     71.29 3705.6 690.64
- ADIPOSITY  1     76.31 3710.7 690.97
- FOREARM    1     86.71 3721.1 691.67
- THIGH      1     88.94 3723.3 691.82
- WRIST      1    105.48 3739.8 692.91
- WEIGHT     1    121.39 3755.7 693.97
- ABDOMEN    1   1684.57 5318.9 780.27
Start:  AIC=1008.41
BODYFAT ~ 1

            Df Sum of Sq     RSS     AIC
+ ABDOMEN    1    9402.4  4947.9  746.33
+ ADIPOSITY  1    7418.9  6931.4  829.94
+ CHEST      1    6908.6  7441.7  847.55
+ HIP        1    5401.9  8948.4  893.28
+ WEIGHT     1    5208.3  9142.0  898.59
+ THIGH      1    4279.5 10070.8  922.58
+ NECK       1    3491.6 10858.7  941.26
+ KNEE       1    3469.7 10880.6  941.76
+ BICEPS     1    3382.9 10967.4  943.73
+ FOREARM    1    1830.0 12520.3  976.58
+ WRIST      1    1803.3 12547.0  977.10
+ AGE        1    1220.1 13130.2  988.37
+ ANKLE      1     933.4 13416.9  993.73
<none>                   14350.3 1008.41
+ HEIGHT     1      14.8 14335.5 1010.15

Step:  AIC=746.33
BODYFAT ~ ABDOMEN

            Df Sum of Sq    RSS    AIC
+ WEIGHT     1    895.55 4052.3 698.82
+ WRIST      1    536.50 4411.4 719.87
+ HIP        1    530.59 4417.3 720.20
+ HEIGHT     1    496.32 4451.6 722.12
+ NECK       1    485.92 4462.0 722.70
+ KNEE       1    309.23 4638.6 732.33
+ ANKLE      1    198.93 4749.0 738.16
+ CHEST      1    189.11 4758.8 738.67
+ THIGH      1    175.01 4772.9 739.40
+ AGE        1    166.29 4781.6 739.86
+ BICEPS     1    117.79 4830.1 742.36
+ ADIPOSITY  1     73.16 4874.7 744.64
<none>                   4947.9 746.33
+ FOREARM    1     39.50 4908.4 746.35

Step:  AIC=698.82
BODYFAT ~ ABDOMEN + WEIGHT

            Df Sum of Sq    RSS    AIC
+ WRIST      1    83.797 3968.5 695.63
+ FOREARM    1    77.117 3975.2 696.05
+ THIGH      1    63.145 3989.2 696.92
+ BICEPS     1    59.956 3992.4 697.12
+ NECK       1    47.429 4004.9 697.90
<none>                   4052.3 698.82
+ ADIPOSITY  1     9.662 4042.7 700.23
+ KNEE       1     5.360 4047.0 700.49
+ AGE        1     3.241 4049.1 700.62
+ ANKLE      1     3.205 4049.1 700.62
+ HEIGHT     1     1.901 4050.4 700.70
+ HIP        1     0.951 4051.4 700.76
+ CHEST      1     0.424 4051.9 700.79

Step:  AIC=695.63
BODYFAT ~ ABDOMEN + WEIGHT + WRIST

            Df Sum of Sq    RSS    AIC
+ FOREARM    1   118.072 3850.5 690.14
+ BICEPS     1    75.584 3892.9 692.87
+ THIGH      1    35.522 3933.0 695.41
<none>                   3968.5 695.63
+ NECK       1    16.091 3952.4 696.63
+ ANKLE      1    13.164 3955.4 696.81
+ KNEE       1    12.548 3956.0 696.85
+ HIP        1     9.650 3958.9 697.03
+ ADIPOSITY  1     9.502 3959.0 697.04
+ AGE        1     7.230 3961.3 697.18
+ HEIGHT     1     1.639 3966.9 697.53
+ CHEST      1     0.033 3968.5 697.63

Step:  AIC=690.14
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM

            Df Sum of Sq    RSS    AIC
+ NECK       1    35.873 3814.6 689.82
<none>                   3850.5 690.14
+ BICEPS     1    27.195 3823.3 690.39
+ THIGH      1    24.974 3825.5 690.53
+ ANKLE      1    16.850 3833.6 691.06
+ AGE        1    16.718 3833.7 691.07
+ KNEE       1    12.211 3838.2 691.36
+ HIP        1     3.397 3847.1 691.93
+ ADIPOSITY  1     2.466 3848.0 691.99
+ CHEST      1     2.398 3848.1 691.99
+ HEIGHT     1     0.006 3850.4 692.14

Step:  AIC=689.82
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM + NECK

            Df Sum of Sq    RSS    AIC
+ BICEPS     1    36.156 3778.4 689.46
<none>                   3814.6 689.82
+ THIGH      1    23.638 3790.9 690.28
+ AGE        1    23.071 3791.5 690.32
+ ANKLE      1    10.712 3803.9 691.13
+ HIP        1     9.020 3805.6 691.24
+ ADIPOSITY  1     6.590 3808.0 691.39
+ KNEE       1     6.483 3808.1 691.40
+ HEIGHT     1     1.402 3813.2 691.73
+ CHEST      1     0.755 3813.8 691.77

Step:  AIC=689.46
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM + NECK + BICEPS

            Df Sum of Sq    RSS    AIC
<none>                   3778.4 689.46
+ AGE        1   24.5397 3753.9 689.85
+ ANKLE      1   12.1804 3766.2 690.66
+ THIGH      1   12.0702 3766.4 690.67
+ HIP        1   10.6467 3767.8 690.76
+ KNEE       1    7.0352 3771.4 691.00
+ ADIPOSITY  1    1.8596 3776.6 691.34
+ CHEST      1    1.4331 3777.0 691.37
+ HEIGHT     1    0.0096 3778.4 691.46
Start:  AIC=1008.41
BODYFAT ~ 1

            Df Sum of Sq     RSS     AIC
+ ABDOMEN    1    9402.4  4947.9  746.33
+ ADIPOSITY  1    7418.9  6931.4  829.94
+ CHEST      1    6908.6  7441.7  847.55
+ HIP        1    5401.9  8948.4  893.28
+ WEIGHT     1    5208.3  9142.0  898.59
+ THIGH      1    4279.5 10070.8  922.58
+ NECK       1    3491.6 10858.7  941.26
+ KNEE       1    3469.7 10880.6  941.76
+ BICEPS     1    3382.9 10967.4  943.73
+ FOREARM    1    1830.0 12520.3  976.58
+ WRIST      1    1803.3 12547.0  977.10
+ AGE        1    1220.1 13130.2  988.37
+ ANKLE      1     933.4 13416.9  993.73
<none>                   14350.3 1008.41
+ HEIGHT     1      14.8 14335.5 1010.15

Step:  AIC=746.33
BODYFAT ~ ABDOMEN

            Df Sum of Sq     RSS     AIC
+ WEIGHT     1     895.6  4052.3  698.82
+ WRIST      1     536.5  4411.4  719.87
+ HIP        1     530.6  4417.3  720.20
+ HEIGHT     1     496.3  4451.6  722.12
+ NECK       1     485.9  4462.0  722.70
+ KNEE       1     309.2  4638.6  732.33
+ ANKLE      1     198.9  4749.0  738.16
+ CHEST      1     189.1  4758.8  738.67
+ THIGH      1     175.0  4772.9  739.40
+ AGE        1     166.3  4781.6  739.86
+ BICEPS     1     117.8  4830.1  742.36
+ ADIPOSITY  1      73.2  4874.7  744.64
<none>                    4947.9  746.33
+ FOREARM    1      39.5  4908.4  746.35
- ABDOMEN    1    9402.4 14350.3 1008.41

Step:  AIC=698.82
BODYFAT ~ ABDOMEN + WEIGHT

            Df Sum of Sq    RSS    AIC
+ WRIST      1      83.8 3968.5 695.63
+ FOREARM    1      77.1 3975.2 696.05
+ THIGH      1      63.1 3989.2 696.92
+ BICEPS     1      60.0 3992.4 697.12
+ NECK       1      47.4 4004.9 697.90
<none>                   4052.3 698.82
+ ADIPOSITY  1       9.7 4042.7 700.23
+ KNEE       1       5.4 4047.0 700.49
+ AGE        1       3.2 4049.1 700.62
+ ANKLE      1       3.2 4049.1 700.62
+ HEIGHT     1       1.9 4050.4 700.70
+ HIP        1       1.0 4051.4 700.76
+ CHEST      1       0.4 4051.9 700.79
- WEIGHT     1     895.6 4947.9 746.33
- ABDOMEN    1    5089.7 9142.0 898.59

Step:  AIC=695.63
BODYFAT ~ ABDOMEN + WEIGHT + WRIST

            Df Sum of Sq    RSS    AIC
+ FOREARM    1     118.1 3850.5 690.14
+ BICEPS     1      75.6 3892.9 692.87
+ THIGH      1      35.5 3933.0 695.41
<none>                   3968.5 695.63
+ NECK       1      16.1 3952.4 696.63
+ ANKLE      1      13.2 3955.4 696.81
+ KNEE       1      12.5 3956.0 696.85
+ HIP        1       9.6 3958.9 697.03
+ ADIPOSITY  1       9.5 3959.0 697.04
+ AGE        1       7.2 3961.3 697.18
+ HEIGHT     1       1.6 3966.9 697.53
+ CHEST      1       0.0 3968.5 697.63
- WRIST      1      83.8 4052.3 698.82
- WEIGHT     1     442.9 4411.4 719.87
- ABDOMEN    1    4916.4 8884.9 893.51

Step:  AIC=690.14
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM

            Df Sum of Sq    RSS    AIC
+ NECK       1      35.9 3814.6 689.82
<none>                   3850.5 690.14
+ BICEPS     1      27.2 3823.3 690.39
+ THIGH      1      25.0 3825.5 690.53
+ ANKLE      1      16.8 3833.6 691.06
+ AGE        1      16.7 3833.7 691.07
+ KNEE       1      12.2 3838.2 691.36
+ HIP        1       3.4 3847.1 691.93
+ ADIPOSITY  1       2.5 3848.0 691.99
+ CHEST      1       2.4 3848.1 691.99
+ HEIGHT     1       0.0 3850.4 692.14
- FOREARM    1     118.1 3968.5 695.63
- WRIST      1     124.8 3975.2 696.05
- WEIGHT     1     551.4 4401.8 721.34
- ABDOMEN    1    5034.5 8884.9 895.51

Step:  AIC=689.82
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM + NECK

            Df Sum of Sq    RSS    AIC
+ BICEPS     1      36.2 3778.4 689.46
<none>                   3814.6 689.82
- NECK       1      35.9 3850.5 690.14
+ THIGH      1      23.6 3790.9 690.28
+ AGE        1      23.1 3791.5 690.32
+ ANKLE      1      10.7 3803.9 691.13
+ HIP        1       9.0 3805.6 691.24
+ ADIPOSITY  1       6.6 3808.0 691.39
+ KNEE       1       6.5 3808.1 691.40
+ HEIGHT     1       1.4 3813.2 691.73
+ CHEST      1       0.8 3813.8 691.77
- WRIST      1      77.1 3891.7 692.78
- FOREARM    1     137.9 3952.4 696.63
- WEIGHT     1     417.6 4232.2 713.59
- ABDOMEN    1    5059.6 8874.1 897.21

Step:  AIC=689.46
BODYFAT ~ ABDOMEN + WEIGHT + WRIST + FOREARM + NECK + BICEPS

            Df Sum of Sq    RSS    AIC
<none>                   3778.4 689.46
- BICEPS     1      36.2 3814.6 689.82
+ AGE        1      24.5 3753.9 689.85
- NECK       1      44.8 3823.3 690.39
+ ANKLE      1      12.2 3766.2 690.66
+ THIGH      1      12.1 3766.4 690.67
+ HIP        1      10.6 3767.8 690.76
+ KNEE       1       7.0 3771.4 691.00
+ ADIPOSITY  1       1.9 3776.6 691.34
+ CHEST      1       1.4 3777.0 691.37
+ HEIGHT     1       0.0 3778.4 691.46
- WRIST      1      75.8 3854.2 692.39
- FOREARM    1      82.8 3861.2 692.84
- WEIGHT     1     451.5 4229.9 715.46
- ABDOMEN    1    5090.1 8868.5 899.05
Start:  AIC=746.26
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST + 
    ABDOMEN + HIP + THIGH + KNEE + ANKLE + BICEPS + FOREARM + 
    WRIST

            Df Sum of Sq    RSS    AIC
- KNEE       1      0.85 3603.1 740.81
- ANKLE      1      5.32 3607.6 741.12
- CHEST      1      8.02 3610.3 741.30
- BICEPS     1     18.07 3620.3 741.99
- HIP        1     44.68 3646.9 743.81
- THIGH      1     44.92 3647.2 743.82
- NECK       1     47.70 3650.0 744.01
- AGE        1     55.57 3657.8 744.55
- FOREARM    1     61.67 3663.9 744.96
- HEIGHT     1     74.89 3677.1 745.85
- ADIPOSITY  1     78.71 3681.0 746.11
<none>                   3602.2 746.26
- WRIST      1    118.08 3720.3 748.75
- WEIGHT     1    127.21 3729.5 749.36
- ABDOMEN    1   1621.45 5223.7 832.92

Step:  AIC=740.81
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST + 
    ABDOMEN + HIP + THIGH + ANKLE + BICEPS + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- ANKLE      1      6.40 3609.5 735.74
- CHEST      1      8.02 3611.1 735.85
- BICEPS     1     17.71 3620.8 736.52
- HIP        1     43.92 3647.0 738.30
- NECK       1     49.52 3652.6 738.69
- THIGH      1     53.57 3656.7 738.96
- AGE        1     61.72 3664.8 739.51
- FOREARM    1     63.84 3666.9 739.66
- HEIGHT     1     74.13 3677.2 740.35
- ADIPOSITY  1     77.90 3681.0 740.61
<none>                   3603.1 740.81
- WRIST      1    117.27 3720.4 743.24
- WEIGHT     1    127.09 3730.2 743.90
- ABDOMEN    1   1624.66 5227.8 827.60

Step:  AIC=735.74
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + CHEST + 
    ABDOMEN + HIP + THIGH + BICEPS + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- CHEST      1      9.48 3619.0 730.88
- BICEPS     1     16.57 3626.1 731.37
- HIP        1     46.96 3656.5 733.44
- THIGH      1     54.52 3664.0 733.95
- NECK       1     55.99 3665.5 734.05
- AGE        1     59.98 3669.5 734.32
- FOREARM    1     62.80 3672.3 734.51
- HEIGHT     1     78.03 3687.5 735.54
<none>                   3609.5 735.74
- ADIPOSITY  1     83.51 3693.0 735.90
- WRIST      1    110.86 3720.4 737.73
- WEIGHT     1    126.05 3735.6 738.74
- ABDOMEN    1   1630.49 5240.0 822.67

Step:  AIC=730.88
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN + 
    HIP + THIGH + BICEPS + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- BICEPS     1     15.36 3634.3 726.42
- HIP        1     39.69 3658.7 728.08
- NECK       1     55.80 3674.8 729.17
- AGE        1     57.64 3676.6 729.29
- FOREARM    1     59.61 3678.6 729.42
- THIGH      1     67.89 3686.9 729.98
- HEIGHT     1     72.97 3692.0 730.32
- ADIPOSITY  1     75.54 3694.5 730.50
<none>                   3619.0 730.88
- WRIST      1    107.04 3726.0 732.60
- WEIGHT     1    127.77 3746.8 733.98
- ABDOMEN    1   1697.80 5316.8 820.77

Step:  AIC=726.42
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN + 
    HIP + THIGH + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- HIP        1     45.32 3679.7 723.99
- NECK       1     51.85 3686.2 724.43
- AGE        1     61.89 3696.2 725.10
- HEIGHT     1     71.29 3705.6 725.73
- ADIPOSITY  1     76.31 3710.7 726.07
<none>                   3634.3 726.42
- FOREARM    1     86.71 3721.1 726.76
- THIGH      1     88.94 3723.3 726.91
- WRIST      1    105.48 3739.8 728.01
- WEIGHT     1    121.39 3755.7 729.06
- ABDOMEN    1   1684.57 5318.9 815.36

Step:  AIC=723.99
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + NECK + ABDOMEN + 
    THIGH + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- NECK       1     35.43 3715.1 720.85
- THIGH      1     55.02 3734.7 722.16
- HEIGHT     1     58.57 3738.2 722.39
- ADIPOSITY  1     60.18 3739.8 722.50
- AGE        1     67.16 3746.8 722.96
<none>                   3679.7 723.99
- WRIST      1    103.49 3783.2 725.36
- FOREARM    1    114.73 3794.4 726.09
- WEIGHT     1    126.42 3806.1 726.85
- ABDOMEN    1   1645.63 5325.3 810.15

Step:  AIC=720.85
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + ABDOMEN + THIGH + 
    FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- THIGH      1     58.02 3773.1 719.19
- AGE        1     58.75 3773.8 719.24
- ADIPOSITY  1     66.48 3781.6 719.74
- HEIGHT     1     69.10 3784.2 719.91
<none>                   3715.1 720.85
- FOREARM    1     96.17 3811.3 721.68
- WRIST      1    141.67 3856.8 724.63
- WEIGHT     1    154.22 3869.3 725.43
- ABDOMEN    1   1672.83 5387.9 807.54

Step:  AIC=719.19
BODYFAT ~ AGE + WEIGHT + HEIGHT + ADIPOSITY + ABDOMEN + FOREARM + 
    WRIST

            Df Sum of Sq    RSS    AIC
- AGE        1     26.76 3799.9 715.43
- HEIGHT     1     57.02 3830.1 717.40
- ADIPOSITY  1     60.54 3833.7 717.63
<none>                   3773.1 719.19
- FOREARM    1    100.23 3873.3 720.18
- WEIGHT     1    124.31 3897.4 721.72
- WRIST      1    151.38 3924.5 723.43
- ABDOMEN    1   1694.54 5467.7 805.67

Step:  AIC=715.43
BODYFAT ~ WEIGHT + HEIGHT + ADIPOSITY + ABDOMEN + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- HEIGHT     1     48.11 3848.0 713.04
- ADIPOSITY  1     50.57 3850.4 713.20
<none>                   3799.9 715.43
- FOREARM    1     91.26 3891.1 715.81
- WEIGHT     1    123.97 3923.8 717.88
- WRIST      1    125.19 3925.1 717.96
- ABDOMEN    1   2662.46 6462.3 841.61

Step:  AIC=713.04
BODYFAT ~ WEIGHT + ADIPOSITY + ABDOMEN + FOREARM + WRIST

            Df Sum of Sq    RSS    AIC
- ADIPOSITY  1      2.47 3850.5 707.69
<none>                   3848.0 713.04
- FOREARM    1    111.04 3959.0 714.59
- WRIST      1    123.52 3971.5 715.37
- WEIGHT     1    528.40 4376.4 739.44
- ABDOMEN    1   2833.77 6681.8 844.39

Step:  AIC=707.69
BODYFAT ~ WEIGHT + ABDOMEN + FOREARM + WRIST

          Df Sum of Sq    RSS    AIC
<none>                 3850.5 707.69
- FOREARM  1     118.1 3968.5 709.67
- WRIST    1     124.8 3975.2 710.09
- WEIGHT   1     551.4 4401.8 735.37
- ABDOMEN  1    5034.5 8884.9 909.55
Start:  AIC=1011.92
BODYFAT ~ 1

            Df Sum of Sq     RSS     AIC
+ ABDOMEN    1    9402.4  4947.9  753.35
+ ADIPOSITY  1    7418.9  6931.4  836.95
+ CHEST      1    6908.6  7441.7  854.57
+ HIP        1    5401.9  8948.4  900.30
+ WEIGHT     1    5208.3  9142.0  905.61
+ THIGH      1    4279.5 10070.8  929.60
+ NECK       1    3491.6 10858.7  948.28
+ KNEE       1    3469.7 10880.6  948.78
+ BICEPS     1    3382.9 10967.4  950.75
+ FOREARM    1    1830.0 12520.3  983.59
+ WRIST      1    1803.3 12547.0  984.12
+ AGE        1    1220.1 13130.2  995.39
+ ANKLE      1     933.4 13416.9 1000.75
<none>                   14350.3 1011.92
+ HEIGHT     1      14.8 14335.5 1017.17

Step:  AIC=753.35
BODYFAT ~ ABDOMEN

            Df Sum of Sq    RSS    AIC
+ WEIGHT     1    895.55 4052.3 709.35
+ WRIST      1    536.50 4411.4 730.40
+ HIP        1    530.59 4417.3 730.73
+ HEIGHT     1    496.32 4451.6 732.65
+ NECK       1    485.92 4462.0 733.23
+ KNEE       1    309.23 4638.6 742.86
+ ANKLE      1    198.93 4749.0 748.69
+ CHEST      1    189.11 4758.8 749.20
+ THIGH      1    175.01 4772.9 749.93
+ AGE        1    166.29 4781.6 750.38
+ BICEPS     1    117.79 4830.1 752.89
<none>                   4947.9 753.35
+ ADIPOSITY  1     73.16 4874.7 755.17
+ FOREARM    1     39.50 4908.4 756.88

Step:  AIC=709.35
BODYFAT ~ ABDOMEN + WEIGHT

            Df Sum of Sq    RSS    AIC
<none>                   4052.3 709.35
+ WRIST      1    83.797 3968.5 709.67
+ FOREARM    1    77.117 3975.2 710.09
+ THIGH      1    63.145 3989.2 710.96
+ BICEPS     1    59.956 3992.4 711.16
+ NECK       1    47.429 4004.9 711.93
+ ADIPOSITY  1     9.662 4042.7 714.26
+ KNEE       1     5.360 4047.0 714.53
+ AGE        1     3.241 4049.1 714.66
+ ANKLE      1     3.205 4049.1 714.66
+ HEIGHT     1     1.901 4050.4 714.74
+ HIP        1     0.951 4051.4 714.80
+ CHEST      1     0.424 4051.9 714.83
Start:  AIC=1011.92
BODYFAT ~ 1

            Df Sum of Sq     RSS     AIC
+ ABDOMEN    1    9402.4  4947.9  753.35
+ ADIPOSITY  1    7418.9  6931.4  836.95
+ CHEST      1    6908.6  7441.7  854.57
+ HIP        1    5401.9  8948.4  900.30
+ WEIGHT     1    5208.3  9142.0  905.61
+ THIGH      1    4279.5 10070.8  929.60
+ NECK       1    3491.6 10858.7  948.28
+ KNEE       1    3469.7 10880.6  948.78
+ BICEPS     1    3382.9 10967.4  950.75
+ FOREARM    1    1830.0 12520.3  983.59
+ WRIST      1    1803.3 12547.0  984.12
+ AGE        1    1220.1 13130.2  995.39
+ ANKLE      1     933.4 13416.9 1000.75
<none>                   14350.3 1011.92
+ HEIGHT     1      14.8 14335.5 1017.17

Step:  AIC=753.35
BODYFAT ~ ABDOMEN

            Df Sum of Sq     RSS     AIC
+ WEIGHT     1     895.6  4052.3  709.35
+ WRIST      1     536.5  4411.4  730.40
+ HIP        1     530.6  4417.3  730.73
+ HEIGHT     1     496.3  4451.6  732.65
+ NECK       1     485.9  4462.0  733.23
+ KNEE       1     309.2  4638.6  742.86
+ ANKLE      1     198.9  4749.0  748.69
+ CHEST      1     189.1  4758.8  749.20
+ THIGH      1     175.0  4772.9  749.93
+ AGE        1     166.3  4781.6  750.38
+ BICEPS     1     117.8  4830.1  752.89
<none>                    4947.9  753.35
+ ADIPOSITY  1      73.2  4874.7  755.17
+ FOREARM    1      39.5  4908.4  756.88
- ABDOMEN    1    9402.4 14350.3 1011.92

Step:  AIC=709.35
BODYFAT ~ ABDOMEN + WEIGHT

            Df Sum of Sq    RSS    AIC
<none>                   4052.3 709.35
+ WRIST      1      83.8 3968.5 709.67
+ FOREARM    1      77.1 3975.2 710.09
+ THIGH      1      63.1 3989.2 710.96
+ BICEPS     1      60.0 3992.4 711.16
+ NECK       1      47.4 4004.9 711.93
+ ADIPOSITY  1       9.7 4042.7 714.26
+ KNEE       1       5.4 4047.0 714.53
+ AGE        1       3.2 4049.1 714.66
+ ANKLE      1       3.2 4049.1 714.66
+ HEIGHT     1       1.9 4050.4 714.74
+ HIP        1       1.0 4051.4 714.80
+ CHEST      1       0.4 4051.9 714.83
- WEIGHT     1     895.6 4947.9 753.35
- ABDOMEN    1    5089.7 9142.0 905.61
Call:
lm(formula = BODYFAT ~ ABDOMEN + WEIGHT, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.1200  -2.9909   0.0619   2.8762   9.6250 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -40.43036    2.40269 -16.827  < 2e-16 ***
ABDOMEN       0.91448    0.05213  17.542  < 2e-16 ***
WEIGHT       -0.14075    0.01913  -7.358  2.8e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.067 on 245 degrees of freedom
Multiple R-squared:  0.7176,	Adjusted R-squared:  0.7153 
F-statistic: 311.3 on 2 and 245 DF,  p-value: < 2.2e-16
16.340017502577
Call:
lm(formula = BODYFAT ~ ABDOMEN)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.1111  -3.4634  -0.0184   2.9368  11.9150 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -34.14018    2.47618  -13.79   <2e-16 ***
ABDOMEN       0.57428    0.02656   21.62   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.485 on 246 degrees of freedom
Multiple R-squared:  0.6552,	Adjusted R-squared:  0.6538 
F-statistic: 467.5 on 1 and 246 DF,  p-value: < 2.2e-16
Call:
lm(formula = (BODYFAT) * WEIGHT ~ ABDOMEN + WEIGHT, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1966.83  -520.24   -20.95   532.37  2120.13 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -11246.016    425.166  -26.45   <2e-16 ***
ABDOMEN        169.216      9.225   18.34   <2e-16 ***
WEIGHT          -4.908      3.385   -1.45    0.148    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 719.7 on 245 degrees of freedom
Multiple R-squared:  0.8478,	Adjusted R-squared:  0.8466 
F-statistic: 682.5 on 2 and 245 DF,  p-value: < 2.2e-16
16.4540563347311
Call:
lm(formula = BODYFAT ~ WEIGHT + ABDOMEN + FOREARM + WRIST, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.1333  -2.7686  -0.1523   2.9328   8.2002 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -33.97123    6.84438  -4.963 1.30e-06 ***
WEIGHT       -0.13658    0.02315  -5.899 1.22e-08 ***
ABDOMEN       0.92457    0.05187  17.825  < 2e-16 ***
FOREARM       0.45735    0.16754   2.730  0.00680 ** 
WRIST        -1.16568    0.41544  -2.806  0.00542 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.981 on 243 degrees of freedom
Multiple R-squared:  0.7317,	Adjusted R-squared:  0.7273 
F-statistic: 165.7 on 4 and 243 DF,  p-value: < 2.2e-16
15.5260274612302
In [36]:
anova(m5, m1)
Res.Df      RSS Df Sum of Sq        F     Pr(>F)
   243 3850.455 NA        NA       NA         NA
   233 3602.249 10  248.2056 1.605439  0.1058225

We ran an ANOVA F-test comparing the full model with the BIC backward model (4 variables). With p = 0.106, the null hypothesis that the 10 additional variables have zero coefficients was retained.

In [37]:
anova(m2, m5)
Res.Df      RSS Df Sum of Sq        F     Pr(>F)
   245 4052.324 NA        NA       NA         NA
   243 3850.455  2  201.8695 6.369935 0.00201211

We ran an ANOVA F-test comparing the BIC backward model (4 variables) with the BIC forward model (2 variables). With p = 0.002, the null hypothesis was rejected, so FOREARM and WRIST add significant explanatory power.
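
The F statistics reported by `anova()` for nested linear models can be recomputed from the residual sums of squares and degrees of freedom alone. The sketch below does this for both comparisons. The helper name `nested_f_test` is our own illustration, and the inputs are the rounded values printed in the two ANOVA tables, so the results match only up to rounding.

```r
# Hypothetical helper (the name is an assumption): partial F-test for nested linear models
nested_f_test <- function(rss_small, df_small, rss_big, df_big) {
  df_diff <- df_small - df_big                       # number of extra coefficients
  f_stat  <- ((rss_small - rss_big) / df_diff) / (rss_big / df_big)
  p_value <- pf(f_stat, df_diff, df_big, lower.tail = FALSE)
  c(F = f_stat, p = p_value)
}

# 4-variable model vs. full model (values from the first ANOVA table)
test_full <- nested_f_test(3850.455, 243, 3602.249, 233)
# 2-variable model vs. 4-variable model (values from the second ANOVA table)
test_four <- nested_f_test(4052.324, 245, 3850.455, 243)

round(rbind(test_full, test_four), 4)
```

A large p-value means the smaller model is not significantly worse than the larger one; a small p-value means the extra variables explain a significant share of the remaining variation.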

We decided to keep both the two-variable and the four-variable models for further improvement.

Model Diagnostics

Two-Variable Model

This model passed the normality, multicollinearity, and homoscedasticity tests.

Linearity Test

In [40]:
crPlots(m2)
In [42]:
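# Box-Cox profile for WEIGHT over a coarse lambda grid; keep the lambda with the highest log-likelihood
bc <- boxcox(WEIGHT ~ 1, data = data, lambda = seq(-10, 10, length = 10))
trans <- bc$x[which.max(bc$y)]
W2 <- WEIGHT^trans  # transformed weight, added as an extra predictor
mt <- lm(BODYFAT ~ ABDOMEN + WEIGHT + W2)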
15.6
In [43]:
par(mfrow = c(2,2))
plot(mt)
par(mfrow = c(1,1))
In [60]:
crPlots(mt)
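
The transformation step above can be tried on its own with simulated data. This is a minimal sketch, assuming a lognormal stand-in for WEIGHT (the names `weight_sim`, `bc_sim`, `lambda_hat`, and `w2_sim` are hypothetical and the values are not from Bodyfat.csv). As far as we understand the MASS defaults, when fewer than 100 lambda values are supplied and plotting is on, `boxcox()` spline-interpolates the profile onto a finer grid, which would explain a selected power that is not one of the ten grid points.

```r
library(MASS)  # provides boxcox()

set.seed(1)
# Simulated stand-in for WEIGHT (lbs); NOT the Bodyfat.csv data.
weight_sim <- rlnorm(248, meanlog = log(175), sdlog = 0.15)

# Profile the Box-Cox log-likelihood over the same coarse grid as in the notebook.
bc_sim <- boxcox(weight_sim ~ 1, lambda = seq(-10, 10, length = 10))

# Keep the lambda with the highest profile log-likelihood.
lambda_hat <- bc_sim$x[which.max(bc_sim$y)]

# Transformed copy of the variable, to be used as an additional predictor.
w2_sim <- weight_sim^lambda_hat
print(lambda_hat)
```

A finer grid, for example `seq(-2, 2, 0.01)`, is a common alternative when the selected power itself is of interest.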

Four-Variable Model

This model passed the normality, multicollinearity, and homoscedasticity tests.

In [50]:
crPlots(m5)
In [51]:
mt_2 <- lm(BODYFAT ~ WEIGHT + W2 + ABDOMEN + FOREARM + WRIST, data = data)

Linearity Test

In [53]:
crPlots(mt_2)

Normality Test

In [54]:
shapiro.test(mt_2$residuals)
	Shapiro-Wilk normality test

data:  mt_2$residuals
W = 0.9922, p-value = 0.2145
In [67]:
new <- data.frame(BODYFAT = BODYFAT, final = mt_2$fitted.values)
orders <- order(new$BODYFAT)
# Plot observed and fitted body fat, both sorted by observed value
plot_ly(new) %>%
  add_trace(x = ~1:248, y = ~new$BODYFAT[orders], type = "scatter", color = "True",
            marker = list(color = '#E69F00'), mode = "markers") %>%
  add_trace(x = ~1:248, y = ~new$final[orders], type = "scatter", color = "Estimated",
            marker = list(color = '#56B4E9'), mode = "markers") %>%
  layout(title = 'True BodyFat Values vs. Estimated',
         xaxis = list(title = "Index in Increasing Order"),
         yaxis = list(title = "BodyFat"))
Warning message in RColorBrewer::brewer.pal(N, "Set2"):
“minimal value for n is 3, returning requested palette with 3 different levels”

Multi-collinearity Test

In [55]:
vif(mt_2)
WEIGHT    26.8021559233496
W2        26.9749323027085
ABDOMEN   4.91591999617982
FOREARM   1.9818092129003
WRIST     2.3417450044006
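
To see why WEIGHT and W2 have much larger VIFs than the other predictors, the VIF can be computed by hand as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The sketch below uses simulated stand-ins rather than Bodyfat.csv. The helper `vif_by_hand` and the `*_sim` variables are hypothetical names, and the simulated numbers will not match the output above.

```r
set.seed(2)
# Simulated stand-ins; NOT the Bodyfat.csv data.
weight_sim  <- rnorm(248, mean = 178, sd = 27)
w2_sim      <- weight_sim^(-0.505)   # same power as in the proposed model
abdomen_sim <- 0.35 * weight_sim + rnorm(248, mean = 30, sd = 6)

# VIF_j = 1 / (1 - R_j^2): regress one predictor on all the others.
vif_by_hand <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vif_weight  <- vif_by_hand(weight_sim,  data.frame(w2_sim, abdomen_sim))
vif_abdomen <- vif_by_hand(abdomen_sim, data.frame(weight_sim, w2_sim))
print(c(WEIGHT = vif_weight, ABDOMEN = vif_abdomen))
```

Because a power of weight is almost a linear function of weight over the observed range, the two terms are nearly collinear by construction. This inflates their individual standard errors but does not by itself harm the fitted values.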

Homoscedasticity Test

In [56]:
ncvTest(mt_2)
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 0.5363876    Df = 1     p = 0.4639337 

Proposed Linear Model

$$BodyFat\ \% = 50.183 - 0.2571Weight - 720.904Weight^{-0.505} + 0.904Abdomen + 0.279Forearm - 1.308Wrist$$
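
The equation above can be wrapped in a small function for quick use. This is a sketch that uses the rounded coefficients exactly as printed, so results may differ slightly from predictions made with the fitted `mt_2` object. The function name `predict_bodyfat` is ours for illustration.

```r
# Proposed model with the rounded coefficients printed above.
# Units: weight in lbs, circumferences (abdomen, forearm, wrist) in cm.
predict_bodyfat <- function(weight, abdomen, forearm, wrist) {
  50.183 - 0.2571 * weight - 720.904 * weight^(-0.505) +
    0.904 * abdomen + 0.279 * forearm - 1.308 * wrist
}

# Example inputs: 170 lbs, abdomen 90 cm, forearm 28 cm, wrist 17 cm
est <- predict_bodyfat(weight = 170, abdomen = 90, forearm = 28, wrist = 17)
print(round(est, 2))
```

For these inputs the function returns roughly 19.5%. The function is vectorized, so it also accepts vectors of measurements for several men at once.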

Strengths and Weaknesses of the Analysis

The model gives a reasonable description of the relationship between body fat % and abdomen circumference and weight.

Moreover, the model has the following strengths and advantages:

  1. Linearity: seems reasonable, according to the component-plus-residual plots and the residuals vs. fitted values plot.
  2. Explanatory variables: reasonable, since it is natural to relate a man's body fat to his weight and abdomen. Intuitively, a man with a large abdomen tends to be fatter. Meanwhile, if he is heavier with his other body measurements held fixed, he is likely to be more muscular, since fat is less dense than muscle. In addition, forearm and wrist circumference can be treated as measures of body frame size, which also contributes to the body fat estimate.
  3. Constant effects: reasonable, because the model does not depend on age or non-body factors.
  4. Normally distributed errors: seems reasonable from the Q-Q plot, and the model also passed the Shapiro-Wilk test (p = 0.21).
  5. Constant variance: the model passed the score test for non-constant error variance (p = 0.46).

Overall, our model provides a relatively simple way to predict body fat % based purely on weight, forearm, wrist, and abdomen.

Finally, some potential weaknesses and open questions remain.

  1. Should there be non-linear relationships? We only fit linear models, and the true relationships could be more complicated.
  2. How should the multicollinearity between the circumferences and other variables be handled? For example, WEIGHT and its transformation W2 both have VIFs of about 27.
  3. Could the conclusion also be used for women? The data was only collected for men, and therefore the model is only suitable for men. Does there exist a general formula for both men and women?

Rule of Thumb

  • "multiply your forearm circumference (cm) by 0.5, add the difference of your abdomen circumference (cm) and wrist circumference (cm), minus one length of your weight (lb) and minus 35"
  • "take your abdomen circumference (cm), subtract 0.2 times your weight (lb), then subtract 40"

Example Usage:

For a 170 lb man with an abdomen circumference of about 90 cm, a forearm circumference of about 28 cm, and a wrist circumference of about 17 cm, the predicted body fat from the proposed model is about 19.55%, with a 95% interval of 18.54% to 20.55%.

With the second rule of thumb, the estimate is 90 - 0.2 × 170 - 40 = 16%.
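
The two-variable shortcut is easy to script. The sketch below reads that rule as abdomen minus 0.2 times weight minus 40, which is the reading that reproduces the 16% quoted above. The function name `rule_of_thumb` is hypothetical.

```r
# Two-variable shortcut: abdomen (cm) - 0.2 * weight (lbs) - 40.
# This reading of the rule is an assumption chosen to match the worked example.
rule_of_thumb <- function(weight, abdomen) {
  abdomen - 0.2 * weight - 40
}

quick <- rule_of_thumb(weight = 170, abdomen = 90)
print(quick)
```

The shortcut ignores forearm and wrist, so it should be expected to differ from the full model by a few percentage points.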